NSF PAR Search | NSF Public Access Repository

Securing HDF5 Plugins with Digital Signatures

https://doi.org/10.1145/3731599.3767557

Song, Glenn; Breitenfeld, Michael Scot; Byna, Suren (November 2025, ACM)

Free, publicly-accessible full text available November 15, 2026
CASSE: Targeted Threat Modeling for Data Management Libraries

https://doi.org/10.1145/3731599.3767556

Sanchez, Keegan; Byna, Suren; Lin, Zhiqiang; Mattson, David (November 2025, ACM)

Free, publicly-accessible full text available November 15, 2026
IOAgent: Democratizing Trustworthy HPC I/O Performance Diagnosis Capability via LLMs

https://doi.org/10.1109/IPDPS64566.2025.00036

Egersdoerfer, Chris; Sareen, Arnav; Bez, Jean Luca; Byna, Suren; Xu, Dongkuan DK; Dai, Dong (June 2025, IEEE)

Free, publicly-accessible full text available June 3, 2026
The Art of Sparsity: Mastering High-Dimensional Tensor Storage

https://doi.org/10.1109/IPDPSW63119.2024.00094

Dong, Bin; Wu, Kesheng; Byna, Suren (May 2024, IEEE)

Full Text Available
PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems

https://doi.org/10.1109/TPDS.2024.3374555

Han, Runzhou; Zheng, Mai; Byna, Suren; Tang, Houjun; Dong, Bin; Dai, Dong; Chen, Yong; Kim, Dongkyun; Hassoun, Joseph; Thorsley, David (May 2024, IEEE Transactions on Parallel and Distributed Systems)

Full Text Available
Runway: In-transit Data Compression on Heterogeneous HPC Systems

https://doi.org/10.1109/CCGrid57682.2023.00030

Ravi, John; Byna, Suren; Becchi, Michela (May 2023, 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid))

To alleviate bottlenecks in storing and accessing data on high-performance computing (HPC) systems, I/O libraries are enabling computation while data is in transit, such as HDF5 filters. For scientific applications that commonly use floating-point data, error-bounded lossy compression methods are a critical technique to significantly reduce the storage and bandwidth requirements. Thus far, deciding when and where to schedule in-transit data transformations, such as compression, has been outside the scope of I/O libraries. In this paper, we introduce Runway, a runtime framework that enables computation on in-transit data with an object storage abstraction. Runway is designed to be extensible to execute user-defined functions at runtime. In this effort, we focus on studying methods to offload data compression operations to available processing units based on latency and throughput. We compare the performance of running compression on multi-core CPUs, as well as offloading it to a GPU and a Data Processing Unit (DPU). We implement a state-of-the-art error-bounded lossy compression algorithm, SZ3, as a Runway function with a variant optimized for DPUs. We propose dynamic modeling to guide scheduling decisions for in-transit data compression. We evaluate Runway using four scientific datasets from the SDRBench benchmark suite on the Perlmutter supercomputer at NERSC.
Full Text Available
Evaluating Asynchronous Parallel I/O on HPC Systems

https://doi.org/10.1109/IPDPS54959.2023.00030

Ravi, John; Byna, Suren; Koziol, Quincey; Tang, Houjun; Becchi, Michela (May 2023, IEEE)

Parallel I/O is an effective method to optimize data movement between memory and storage for many scientific applications. Poor performance of traditional disk-based file systems has led to the design of I/O libraries which take advantage of faster memory layers, such as on-node memory, present in high-performance computing (HPC) systems. By allowing caching and prefetching of data for applications alternating computation and I/O phases, a faster memory layer also provides opportunities for hiding the latency of I/O phases by overlapping them with computation phases, a technique called asynchronous I/O. Since asynchronous parallel I/O in HPC systems is still in the initial stages of development, there hasn't been a systematic study of the factors affecting its performance. In this paper, we perform a systematic study of various factors affecting the performance and efficacy of asynchronous I/O, we develop a performance model to estimate the aggregate I/O bandwidth achievable by iterative applications using synchronous and asynchronous I/O based on past observations, and we evaluate the performance of the recently developed asynchronous I/O feature of a parallel I/O library (HDF5) using benchmarks and real-world science applications. Our study covers parallel file systems on two large-scale HPC systems: Summit and Cori, the former with GPFS storage and the latter with a Lustre parallel file system.
Full Text Available
Efficient Asynchronous I/O with Request Merging

https://doi.org/10.1109/IPDPSW59300.2023.00107

Chowdhury, Md Kamal; Tang, Houjun; Bez, Jean Luca; Bangalore, Purushotham V.; Byna, Suren (May 2023, IEEE)

Full Text Available
Accelerating Parallel Write via Deeply Integrating Predictive Lossy Compression with HDF5

https://doi.org/10.1109/SC41404.2022.00066

Jin, Sian; Tao, Dingwen; Tang, Houjun; Di, Sheng; Byna, Suren; Lukic, Zarija; Cappello, Franck (November 2022, SC22: International Conference for High Performance Computing, Networking, Storage and Analysis)

Lossy compression is one of the most efficient solutions to reduce storage overhead and improve I/O performance for HPC applications. However, existing parallel I/O libraries cannot fully utilize lossy compression to accelerate parallel write due to the lack of a deep understanding of compression-write performance. To this end, we propose to deeply integrate predictive lossy compression with HDF5 to significantly improve the parallel-write performance. Specifically, we propose analytical models to predict the time of compression and parallel write before the actual compression to enable compression-write overlapping. We also introduce an extra space in the process to handle possible data overflows resulting from prediction uncertainty in compression ratios. Moreover, we propose an optimization to reorder the compression tasks to increase the overlapping efficiency. Experiments with up to 4,096 cores from Summit show that our solution improves the write performance by up to 4.5× and 2.9× over the non-compression and lossy compression solutions, respectively, with only 1.5% storage overhead (compared to original data) on two real-world HPC applications.
Full Text Available
